Click below to select which one you want to view.
This study sought to analyze the behavior of the IFIX. To this end, historical data was collected from January 2015 to September 2023, and the correlation with the IBOV, Selic and IPCA indices was explored. Subsequently, the ARIMA statistical model, which combines autoregressive (AR) and moving-average (MA) components with differencing (I) to capture temporal patterns in the data, was applied. It was concluded that a longer period and shorter intervals would give future studies more robustness. This is evident given the context experienced in the period in question: multiple governments, an impeachment, a pandemic, the wide amplitude of the Selic rate, crises… Even so, the ARIMA model identified trends and prediction patterns, which can be observed in the "out-of-sample" experiment. Furthermore, the present work fulfilled its objectives, oriented by a didactic, exploratory and speculative approach and intended to demonstrate the author's repertoire on the themes in question.
It is assumed that a good investment is made at the appropriate place, time and value. The proposal here was to look into the behavior of the IFIX (a Brazilian Real Estate Investment Fund Index) on an exploratory basis, in order to analyze this index's intrinsic relationships.
To this end, the correlation with other indices was analyzed, in particular the IBOV (IBOVESPA, the São Paulo Stock Exchange Index), the Selic (Special Settlement and Custody System rate) and the IPCA (Broad National Consumer Price Index). The period analyzed was from January 2015 to September 2023, with a monthly interval.
Secondly, the ARIMA statistical model was used, which combines autoregressive (AR) and moving-average (MA) components, together with differencing (I), to capture temporal patterns in the data.
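For reference, an ARIMA(p, d, q) model can be written, for the series obtained after d differences, as:
\[ y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t \]
where the \(\phi\) terms are the autoregressive coefficients, the \(\theta\) terms are the moving-average coefficients, and \(\varepsilon_t\) is white noise.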
Its general objectives are:
Its specific objectives are:
The database was formed from the following sources: the Selic and IPCA series were provided by BACEN (Brazilian Central Bank), and the IFIX and IBOV series were collected with the TvDataFeed library, which uses the TradingView website as its source. More details about the methodology can be found in the topic Replicability and throughout the work itself.
“This material (…) is not a recommendation to buy or sell financial assets and products. Any investment decision must be made through professional and personalized advice, taking into account the objectives, needs and financial situation specific to each investor.”
(Guidance on disclosing investment information. Comissão de Valores Mobiliários - CVM)
To display the libraries and tools used, click the button:
R Libraries:
library(tidyverse)
library(plotly)
library(lubridate)
library(tibble)
library(jsonlite)
library(httr)
library(reticulate)
library(dplyr)
library(tseries)
library(forecast)
Python Libraries:
Take, as an example, the line of code httr::GET(url). Although the httr:: prefix is unnecessary, it was made explicit throughout this work for personal and good-practice reasons, such as organization and making the origin of each function clear.
In this topic, the versions were detailed to enable replication of
the results obtained. For more details, click the button:
If you want to replicate, make sure you have the necessary packages by running the following code:
library <- c("tidyverse",
"plotly",
"lubridate",
"tibble",
"jsonlite",
"httr",
"reticulate",
"dplyr",
"tseries",
"forecast")
if (sum(as.numeric(!library %in% installed.packages())) != 0) {
instalador <- library[!library %in% installed.packages()]
for (i in 1:length(instalador)) {
install.packages(instalador, dependencies = T)
break()}
sapply(library, require, character = T)
} else {
sapply(library, require, character = T)
}## tidyverse plotly lubridate tibble jsonlite httr reticulate
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## dplyr tseries forecast
## TRUE TRUE TRUE
The forked version of TvDataFeed was also used. To install:
!pip install --upgrade --no-cache-dir git+https://github.com/rongardF/tvdatafeed.git
The current versions of R, Python and reticulate:
pver = (__import__('sys').version)[0:6]
print(" Reticulate version", r.rrtver,"\n Python version", pver,"\n", r.rver)
## Reticulate version 1.32.0
## Python version 3.9.16
## R version 4.3.1 (2023-06-16)
The packages and versions used were:
for (package_name in sort(loadedNamespaces())) {print(paste(package_name, packageVersion(package_name)))}
## [1] "base 4.3.1"
## [1] "bslib 0.5.0"
## [1] "cachem 1.0.8"
## [1] "cli 3.6.1"
## [1] "colorspace 2.1.0"
## [1] "compiler 4.3.1"
## [1] "curl 5.0.0"
## [1] "data.table 1.14.8"
## [1] "datasets 4.3.1"
## [1] "digest 0.6.33"
## [1] "dplyr 1.1.3"
## [1] "evaluate 0.21"
## [1] "fansi 1.0.4"
## [1] "fastmap 1.1.1"
## [1] "forcats 1.0.0"
## [1] "forecast 8.21.1"
## [1] "fracdiff 1.5.2"
## [1] "generics 0.1.3"
## [1] "ggplot2 3.4.2"
## [1] "glue 1.6.2"
## [1] "graphics 4.3.1"
## [1] "grDevices 4.3.1"
## [1] "grid 4.3.1"
## [1] "gtable 0.3.3"
## [1] "here 1.0.1"
## [1] "hms 1.1.3"
## [1] "htmltools 0.5.6"
## [1] "htmlwidgets 1.6.2"
## [1] "httr 1.4.6"
## [1] "jquerylib 0.1.4"
## [1] "jsonlite 1.8.7"
## [1] "knitr 1.43"
## [1] "lattice 0.21.8"
## [1] "lazyeval 0.2.2"
## [1] "lifecycle 1.0.3"
## [1] "lmtest 0.9.40"
## [1] "lubridate 1.9.2"
## [1] "magrittr 2.0.3"
## [1] "Matrix 1.6.0"
## [1] "methods 4.3.1"
## [1] "munsell 0.5.0"
## [1] "nlme 3.1.162"
## [1] "nnet 7.3.19"
## [1] "parallel 4.3.1"
## [1] "pillar 1.9.0"
## [1] "pkgconfig 2.0.3"
## [1] "plotly 4.10.2"
## [1] "png 0.1.8"
## [1] "purrr 1.0.2"
## [1] "quadprog 1.5.8"
## [1] "quantmod 0.4.25"
## [1] "R6 2.5.1"
## [1] "Rcpp 1.0.11"
## [1] "readr 2.1.4"
## [1] "reticulate 1.32.0"
## [1] "rlang 1.1.1"
## [1] "rmarkdown 2.23"
## [1] "rprojroot 2.0.3"
## [1] "rstudioapi 0.14"
## [1] "sass 0.4.7"
## [1] "scales 1.2.1"
## [1] "stats 4.3.1"
## [1] "stringi 1.7.12"
## [1] "stringr 1.5.0"
## [1] "tibble 3.2.1"
## [1] "tidyr 1.3.0"
## [1] "tidyselect 1.2.0"
## [1] "tidyverse 2.0.0"
## [1] "timechange 0.2.0"
## [1] "timeDate 4022.108"
## [1] "tools 4.3.1"
## [1] "tseries 0.10.54"
## [1] "TTR 0.24.3"
## [1] "tzdb 0.4.0"
## [1] "urca 1.3.3"
## [1] "utf8 1.2.3"
## [1] "utils 4.3.1"
## [1] "vctrs 0.6.3"
## [1] "viridisLite 0.4.2"
## [1] "withr 2.5.0"
## [1] "xfun 0.40"
## [1] "xts 0.13.1"
## [1] "yaml 2.3.7"
## [1] "zoo 1.8.12"
Finally, the seed was fixed so that any eventual random generation is reproducible.
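The corresponding code is not shown above; a minimal sketch of what such a step looks like is given below (the value 123 is only illustrative, the seed actually used is not displayed here):
# Fix the random seed so that any eventual random generation is reproducible
set.seed(123)  # illustrative value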
To display details about the acquisition of Data Frames
click on the button:
Initial parameter definition:
If necessary, import credentials from a *.json file into the Python environment.
# User/password definition (for acquiring the IBOV and IFIX).
#tradingview <- jsonlite::fromJSON("tradingview.json")
#py$username <- tradingview$username
#py$password <- tradingview$password
The TvDataFeed library allows queries to be performed without being logged in, although certain requests will then be limited. Although login was not necessary for this work, the possibility of logging in using a .json file containing a username and password is made available in the (commented-out) code above.
The main R libraries (such as quantmod, tidyquant, BatchGetSymbols, investpy, …) "pull" data from Yahoo Finance, which is incomplete for the IFIX index. In addition, some of these libraries or their data sources have been discontinued, such as Google Finance (in March 2018) and the Investing website. Alpha Vantage is also not a viable option. With this in mind, the forked TvDataFeed library, which receives data from the TradingView website, was used below. In this part, Python is used, and the result is later returned to R.
## you are using nologin method, data you access may be limited
For this particular study, no login is required as previously mentioned.
# Assumed initialization from a previous (collapsed) chunk of the tvDatafeed fork:
#   from tvDatafeed import TvDatafeed, Interval
#   tv = TvDatafeed()  # nologin mode, as the message above indicates
# n_bars = 106 is the number of monthly bars needed for the series to start in 12/2014.
ifix = tv.get_hist(symbol='IFIX',exchange='BMFBOVESPA',interval=Interval.in_monthly,n_bars=106)
ibov = tv.get_hist(symbol='IBOV',exchange='BMFBOVESPA',interval=Interval.in_monthly,n_bars=106)
Receiving and displaying the Data Frame from IFIX in the R environment:
The following indices were acquired through a request via the BACEN (Brazilian Central Bank) API:
https://api.bcb.gov.br/dados/serie/bcdata.sgs.{codigo_serie}/dados?formato=json&dataInicial={dataInicial}&dataFinal={dataFinal}
For more details check the References section.
# Initial parameters for acquiring the IPCA and Selic.
dataInicial <- "01/01/2015"
dataFinal <- "01/09/2023"
formato <- "json"
cod_ipca <- "10844"
cod_selic <- "4390"
Setting path:
ipca_url <- sprintf("https://api.bcb.gov.br/dados/serie/bcdata.sgs.%s/dados?formato=%s&dataInicial=%s&dataFinal=%s", cod_ipca, formato, dataInicial, dataFinal)
cat(ipca_url)
## https://api.bcb.gov.br/dados/serie/bcdata.sgs.10844/dados?formato=json&dataInicial=01/01/2015&dataFinal=01/09/2023
Performing the request and transforming the *.json* content into a data frame to be worked on:
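The request chunk itself is not displayed here. A minimal sketch of how it can be done with the packages already loaded is shown below (the same pattern applies to the Selic URL further on; the intermediate name ipca_resp is illustrative, while ipca_df matches the object used in the cleaning step):
# Perform the GET request against the BACEN API
ipca_resp <- httr::GET(ipca_url)
# Extract the response body as text and parse the JSON into a data frame
ipca_df <- jsonlite::fromJSON(httr::content(ipca_resp, as = "text", encoding = "UTF-8"))
head(ipca_df)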
Setting path:
selic_url <- sprintf("https://api.bcb.gov.br/dados/serie/bcdata.sgs.%s/dados?formato=%s&dataInicial=%s&dataFinal=%s", cod_selic, formato, dataInicial, dataFinal)
cat(selic_url)
## https://api.bcb.gov.br/dados/serie/bcdata.sgs.4390/dados?formato=json&dataInicial=01/01/2015&dataFinal=01/09/2023
Performing the request and transforming the *.json* content into a data frame to be worked on:
The present study worked with four data frames, containing data on the IFIX, IBOV, IPCA and Selic indices, organized temporally by month. The "raw" data can be viewed below:
Click above to select which one you want to view.
At this point, data cleaning and standardization were carried out as follows. For the quoted indices (IFIX and IBOV), the monthly opening value (open) was chosen. Finally, the data was consolidated into a single table, as shown further below.
Click above to select which one you want to view.
# Moving the index (date) to a column
ifix_df$timestamp <- row.names(ifix_df)
# Formatting the date and standardizing days to 01 ("normalizing" business days)
ifix_df$timestamp <- as.Date(ifix_df$timestamp, format = "%Y-%m-%d")
ifix_df$timestamp <- lubridate::floor_date(ifix_df$timestamp, unit = "month")
# Resetting the row numbering (index)
ifix_df <- data.frame(ifix_df, row.names = NULL)
# Converting values into percentage variations
ifix_df <- dplyr::mutate(ifix_df, IFIX = ((ifix_df$open - lag(ifix_df$open, default = dplyr::first(ifix_df$open)))/lag(ifix_df$open, default = dplyr::first(ifix_df$open))) * 100)
# Selecting the columns of interest
ifix_df <- subset(ifix_df, select = c(timestamp, IFIX))
# Removing the first row (12/2014)
ifix_df <- dplyr::slice(ifix_df, -1)
# Rounding to two decimal places
ifix_df$IFIX <- round(ifix_df$IFIX, 2)
# Converting to a tibble
ifix_df <- as_tibble(ifix_df)
The variation was obtained using the formula: \[\text{Variation} = \dfrac{\text{Later Value} - \text{Previous Value}}{\text{Previous Value}} \times 100\]
After standardization, we have:
# Moving the index (date) to a column
ibov_df$timestamp <- row.names(ibov_df)
# Formatting the date and standardizing days to 01 ("normalizing" business days)
ibov_df$timestamp <- as.Date(ibov_df$timestamp, format = "%Y-%m-%d")
ibov_df$timestamp <- lubridate::floor_date(ibov_df$timestamp, unit = "month")
# Resetting the row numbering (index)
ibov_df <- data.frame(ibov_df, row.names = NULL)
# Converting values into percentage variations
ibov_df <- dplyr::mutate(ibov_df, IBOV = ((ibov_df$open - lag(ibov_df$open, default = dplyr::first(ibov_df$open)))/lag(ibov_df$open, default = dplyr::first(ibov_df$open))) * 100)
# Selecting the columns of interest
ibov_df <- subset(ibov_df, select = c(timestamp, IBOV))
# Removing the first row (12/2014)
ibov_df <- dplyr::slice(ibov_df, -1)
# Rounding to two decimal places
ibov_df$IBOV <- round(ibov_df$IBOV, 2)
# Converting to a tibble
ibov_df <- as_tibble(ibov_df)
The variation was obtained using the formula: \[\text{Variation} = \dfrac{\text{Later Value} - \text{Previous Value}}{\text{Previous Value}} \times 100\]
After standardization, we have:
# Renaming the columns
names(ipca_df) <- c("timestamp", "IPCA")
# Standardizing the date formatting
ipca_df$timestamp <- as.Date(ipca_df$timestamp, format = "%d/%m/%Y")
# Changing the class from <chr> to <dbl>
ipca_df$IPCA <- as.numeric(ipca_df$IPCA)
# Converting to a tibble
ipca_df <- as_tibble(ipca_df)
After standardization, we have:
# Renaming the columns
names(selic_df) <- c("timestamp", "Selic")
# Standardizing the date formatting
selic_df$timestamp <- as.Date(selic_df$timestamp, format = "%d/%m/%Y")
# Changing the class from <chr> to <dbl>
selic_df$Selic <- as.numeric(selic_df$Selic)
# Converting to a tibble
selic_df <- as_tibble(selic_df)
After standardization, we have:
combined_df <- ifix_df %>%
left_join(ibov_df, by = "timestamp") %>%
left_join(ipca_df, by = "timestamp") %>%
left_join(selic_df, by = "timestamp")
combined_df
## [1] NA NA NA
## [1] 0.60 1.41 0.25 0.52 -0.06 0.62 0.25 0.08 NA NA
# Excluding the last two rows
df_sem_ultima_linha <- combined_df[1:(nrow(combined_df) - 2), ]
tail(df_sem_ultima_linha$IPCA,10)
## [1] 0.13 0.44 0.60 1.41 0.25 0.52 -0.06 0.62 0.25 0.08
The data, now standardized, is displayed graphically below. The variation graph shows only the month-to-month fluctuations and offers little information in isolation; right after it, a graph of the accumulated sum of the variations was created, which better represents the interval.
The graphs are interactive: it is possible to isolate periods by clicking, holding and dragging over the desired interval, or to hover the mouse over the month of interest so that the details are displayed in a label.
Click above to select which one you want to view.
plot_var <- plot_ly(data = combined_df, x = ~timestamp, type = "scatter",
y = ~IFIX, mode = "lines", name = "IFIX") %>%
add_trace(y = ~IBOV, mode = "lines", name = "IBOV") %>%
add_trace(y = ~IPCA, mode = "lines", name = "IPCA") %>%
add_trace(y = ~Selic, mode = "lines", name = "Selic") %>%
layout(title = "Variações do Período",
font = list(color = "white"),
xaxis = list(title = "Data", nticks = 9, gridcolor = "#303030"),
yaxis = list(title = "Variação (valores em porcentagem)", gridcolor = "#303030"),
hovermode = "x unified",
paper_bgcolor = "#222222",
plot_bgcolor = "#222222",
colorway = c("#ff7f0e","#17becf","#d62728","#33a02c")
)
plot_var
# Cumulative sum of the variations
acum_df <- combined_df
acum_df$IFIX <- cumsum(acum_df$IFIX)
acum_df$IBOV <- cumsum(acum_df$IBOV)
acum_df$IPCA <- cumsum(acum_df$IPCA)
acum_df$Selic <- cumsum(acum_df$Selic)
plot_acum <- plot_ly(data = acum_df, x = ~timestamp, type = "scatter",
y = ~IFIX, mode = "lines", name = "IFIX") %>%
add_trace(y = ~IBOV, mode = "lines", name = "IBOV") %>%
add_trace(y = ~IPCA, mode = "lines", name = "IPCA") %>%
add_trace(y = ~Selic, mode = "lines", name = "Selic") %>%
layout(title = "Variação Acumulada do Período",
font = list(color = "white"),
xaxis = list(title = "Data", nticks = 9, gridcolor = "#303030"),
yaxis = list(title = "Rendimento (valores em porcentagem)", gridcolor = "#303030"),
hovermode = "x unified",
paper_bgcolor = "#222222",
plot_bgcolor = "#222222",
colorway = c("#ff7f0e","#17becf","#d62728","#33a02c")
)
plot_acum
Read about the Graphical Analysis carried out by clicking
the button:
During the period, the IFIX showed considerable volatility (from a minimum of about -15% to a maximum of about 10% per month), perhaps amplified by government changes (sudden and dichotomous moments of high/low Selic) and, mainly, the pandemic. The IBOV is known to be a volatile index, as it represents stocks; even so, what was stated about the IFIX possibly applies here as well.
The IFIX figures in this period were a surprise: its median monthly variation (1.03%) is higher than that of the IBOV (0.84%), even though the IBOV mean is slightly higher. Combined with the fact that the fluctuation of the IFIX is lower than that of the IBOV, this makes the IFIX attractive when weighing "profitability" against "security". This argument is additionally supported by the quartiles, that is, lower volatility and supposedly greater "security". Such statements are nevertheless limited by the monthly interval; this issue is addressed further in the topic Final Considerations.
The negative minimum of the IPCA implies deflation and, objectively, points to a possible moment of emerging from a crisis or a period of great uncertainty/variability.
In terms of sampling and representativeness, there are two diametrically opposed positions to reflect on:
## IFIX IBOV IPCA Selic
## Min. :-15.8500 Min. :-29.970 Min. :-0.4700 Min. :0.1300
## 1st Qu.: -0.8650 1st Qu.: -3.120 1st Qu.: 0.1950 1st Qu.:0.4700
## Median : 1.0300 Median : 0.840 Median : 0.3900 Median :0.7300
## Mean : 0.9073 Mean : 1.099 Mean : 0.4062 Mean :0.7154
## 3rd Qu.: 2.6150 3rd Qu.: 6.035 3rd Qu.: 0.6200 3rd Qu.:1.0550
## Max. : 10.6300 Max. : 16.960 Max. : 1.4100 Max. :1.2200
To better analyze the behavior of these indexes, a correlation analysis (Pearson Correlation) was carried out next.
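For reference, the Pearson correlation coefficient between two series x and y is:
\[ r_{xy} = \dfrac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}} \]
It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship.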
To carry out the correlation calculation, a period of twelve months
will first be considered. That is, “in the last 12 months the rate of
index x was y…”.
Read about the Correlation Analysis performed by
clicking the button:
Defining a data frame with the accumulated sum over the last twelve months. Naturally, the first months only accumulate the observations available up to that point, so their values cover less than a full 12-month window. This does not affect the calculation, which considers only the interaction between the series.
# Data frame containing the values accumulated over the last year.
l12df <- combined_df %>%
  mutate(across(c(IFIX, IBOV, IPCA, Selic),
                ~ cumsum(.) - lag(cumsum(.), n = 12, default = 0)))
l12df
Therefore, below is the graphical representation of the accumulated moving variation for the period:
plot_last12 <- plot_ly(data = l12df, x = ~timestamp, type = "scatter",
y = ~IFIX, mode = "lines", name = "IFIX") %>%
add_trace(y = ~IBOV, mode = "lines", name = "IBOV") %>%
add_trace(y = ~IPCA, mode = "lines", name = "IPCA") %>%
add_trace(y = ~Selic, mode = "lines", name = "Selic") %>%
layout(title = "Variação Acumulada Móvel com Intervalo de 12 meses",
font = list(color = "white"),
xaxis = list(title = "Data", nticks = 9, gridcolor = "#303030"),
yaxis = list(title = "Variação (valores em porcentagem)", gridcolor = "#303030"),
hovermode = "x unified",
paper_bgcolor = "#222222",
plot_bgcolor = "#222222",
colorway = c("#ff7f0e","#17becf","#d62728","#33a02c")
)
plot_last12
The graph above apparently tells a story, especially if we focus on the Selic series. At first, right after a stabilization or fall of the Selic, the IFIX and IBOV respond accordingly (period from 2017 to 2022). On the other hand, during upward movements of the Selic (from 2015 to 2016 and again from 2022 to 2023) there was a depreciation of the IFIX and IBOV.
Although with the graph above it is possible to hypothesize the previous statement, any minimally studied investor has certainly already concluded the same. The magnitude of this correlation was calculated below.
# Excluding the last two rows due to missing IPCA data.
l12df_sem_ultima_linha <- l12df[1:(nrow(l12df) - 2), ]
# Computing the correlation matrix
l12cor <- cor(l12df_sem_ultima_linha[, c("IFIX", "IBOV", "IPCA", "Selic")])
l12cor
## IFIX IBOV IPCA Selic
## IFIX 1.0000000 0.63192296 0.1043657 0.44200095
## IBOV 0.6319230 1.00000000 -0.2823617 -0.02893405
## IPCA 0.1043657 -0.28236169 1.0000000 0.81675913
## Selic 0.4420010 -0.02893405 0.8167591 1.00000000
heatmap_l12 <- plot_ly(
type = "heatmap",
colorscale = "Portland",
z = l12cor,
x = colnames(l12cor),
y = rownames(l12cor),
zmin = -1,
zmax = 1,
reversescale = TRUE
) %>% layout(
title = "Matriz de Correlação",
font = list(color = '#FFFFFF'),
paper_bgcolor = "#222222",
showlegend = FALSE
)
# Displaying annotations on the plot where the value differs from 1.
for (nr in 1:nrow(l12cor)) {
for (nc in 1:ncol(l12cor)) {
if (l12cor[nr, nc] != 1) {
heatmap_l12 <- heatmap_l12 %>%
add_annotations(
text = round(l12cor[nr, nc], 6),
x = colnames(l12cor)[nc],
y = rownames(l12cor)[nr],
showarrow = FALSE,
font = list(size = 14, color = "white"))}}}
heatmap_l12
The IFIX x IBOV relationship (0.63) exhibits a moderate, positive correlation, suggesting that the two indices tend to move in the same direction, although the relationship is not perfectly linear.
The IPCA x Selic relationship (0.82)
exhibits a high and positive correlation. This is expected since one of
Selic’s functions is precisely to control inflation.
Note the selected sample period (2015 to 2023), which is possibly small, and especially the monthly interval; this may explain the apparent "lack of expressiveness" in the calculated correlations.
To display the analysis click on the button:
Checking data:
## [1] 2.69 -0.27 -1.62 3.68 1.47 3.03 0.70 -0.87 -3.94 2.13
## [11] 1.58 -3.01 -6.11 2.88 9.16 4.61 3.74 1.60 5.97 1.80
## [21] 2.77 3.85 -2.59 1.50 3.75 4.87 0.19 0.15 1.03 0.84
## [31] -0.38 0.87 6.57 0.23 -0.59 0.60 2.64 1.14 2.00 -0.86
## [41] -5.27 -4.02 1.39 -0.71 -0.22 5.04 2.59 2.22 2.47 1.03
## [51] 1.99 1.03 1.76 2.87 1.28 -0.11 1.04 4.01 3.52 10.63
## [61] -3.76 -3.69 -15.85 4.39 2.08 5.59 -2.60 1.79 0.46 -1.01
## [71] 1.52 2.18 0.32 0.25 -1.38 0.51 -1.54 -2.20 2.51 -2.63
## [81] -1.24 -1.47 -3.64 8.78 -0.99 -1.29 1.42 1.19 0.26 -0.88
## [91] 0.66 5.76 0.49 0.02 -4.15 0.00 -1.60 -0.45 -1.69 3.52
## [101] 5.43 4.71 1.33
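The test chunk itself is not displayed here; the warning and output below are consistent with a call of the following form (a sketch using the tseries package loaded earlier):
# Augmented Dickey-Fuller test for stationarity of the IFIX series
tseries::adf.test(df_sem_ultima_linha$IFIX)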
## Warning in adf.test(df_sem_ultima_linha$IFIX): p-value smaller than printed
## p-value
##
## Augmented Dickey-Fuller Test
##
## data: df_sem_ultima_linha$IFIX
## Dickey-Fuller = -4.736, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
Using the ADF test: as the p-value is less than 0.05, we reject the null hypothesis (H0), suggesting that the series is stationary.
To display the analysis click on the button:
Checking better parameters for ARIMA.
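The selection chunk is not displayed here; the output below is consistent with a call of the following form (a sketch using the forecast package):
# Automatic search for the (p, d, q) orders that minimize the information criteria
forecast::auto.arima(df_sem_ultima_linha$IFIX)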
## Series: df_sem_ultima_linha$IFIX
## ARIMA(0,0,3) with non-zero mean
##
## Coefficients:
## ma1 ma2 ma3 mean
## 0.1813 0.0472 -0.3458 0.8961
## s.e. 0.0908 0.0946 0.1015 0.2772
##
## sigma^2 = 10.38: log likelihood = -264.83
## AIC=539.65 AICc=540.27 BIC=552.83
Resulting in (0,0,3) for (p, d, q), that is, (AR, I, MA).
To display the analysis click on the button:
With the previous parameters suggested, the following model was obtained:
##
## Call:
## arima(x = combined_df$IFIX, order = c(0, 0, 3))
##
## Coefficients:
## ma1 ma2 ma3 intercept
## 0.1820 0.0480 -0.3443 0.9009
## s.e. 0.0899 0.0937 0.1003 0.2727
##
## sigma^2 estimated as 9.788: log likelihood = -268.97, aic = 547.95
To display the analysis click on the button:
Checking whether the parameters are adequate, that is, whether the residuals are correlated. If they were correlated, the model would not be adequate, as such residuals would still contain information that should be used in the forecast. In other words, this tests for bias in the parameters.
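A minimal sketch of how these residual diagnostics can be produced (assuming the fitted model is stored in ifix_mdl, the name used later in the document; res_ifix_mdl is the name that appears in the Box-Pierce test further below):
# Extract the residuals of the fitted ARIMA model
res_ifix_mdl <- residuals(ifix_mdl)
res_ifix_mdl
# Autocorrelation of the residuals (the plot with the confidence band discussed below)
forecast::Acf(res_ifix_mdl)
# Mean of the residuals (should be close to zero)
mean(res_ifix_mdl)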
## Time Series:
## Start = 1
## End = 105
## Frequency = 1
## [1] 1.66547441 -1.35698763 -2.14142330 3.64430149 -0.43570629
## [6] 1.37393885 0.81280871 -2.12755146 -4.01777475 2.33561207
## [11] -0.28582471 -5.35094132 -5.21961199 3.08706585 6.10481560
## [16] 0.65275810 3.49044081 2.13449718 4.73789792 1.13613881
## [21] 2.17002242 4.13104095 -3.95577357 1.86831360 4.12127016
## [26] 1.76712332 -0.58692606 0.69033111 0.64011316 -0.41261261
## [31] -0.99874957 0.39116190 5.50375867 -2.03546477 -1.24965296
## [36] 1.91944034 0.74876706 -0.41955879 1.80053521 -1.81068311
## [41] -6.07208751 -3.10862022 0.72283569 -3.68419514 -1.55528121
## [46] 4.84790304 -0.38740500 0.62155062 3.14390137 -0.60640876
## [51] 1.26273494 1.01092628 0.40571532 2.28159249 0.29242580
## [56] -1.03384677 1.09895773 3.05936613 1.65348199 9.65978212
## [61] -5.44522311 -3.49363084 -12.52739687 4.06225113 -0.16241213
## [66] 0.21013286 -2.13252799 1.21134271 -0.48672410 -2.61468675
## [71] 1.53558607 0.95742304 -1.72916615 0.14675470 -1.89494990
## [76] -0.64836403 -2.18139688 -3.32516111 2.09584665 -4.50403139
## [81] -2.56646469 -0.96590460 -5.79283283 8.09628106 -3.41945072
## [86] -3.95147701 4.19039407 -1.46159776 -1.93646577 0.08469577
## [91] -0.66667577 4.30963171 -1.13426066 -1.11068402 -3.31027712
## [96] -0.63554086 -2.60882323 -1.98532103 -2.32314292 2.23896315
## [101] 3.54936359 2.25563565 0.61920833 0.59039642 -0.06133984
Considering that the lags (the "peaks" in the graph) are within the confidence band (horizontal blue dashed lines), this suggests that the residuals are not correlated with each other; that is, the fitted model captures the patterns in the data.
## [1] -0.007488662
In this step, it was checked that (on average) the forecast errors are not biased above or below the real value. If the mean of the residuals were noticeably different from 0, it would suggest a bias in the model. This supports the residuals behaving as "white noise".
Checking the normal distribution of the residuals:
The "bell shape" of a normal distribution is desirable, suggesting there is no bias in the residuals.
The previous tests respectively suggest that:
Complementing them, we have Box.test:
##
## Box-Pierce test
##
## data: res_ifix_mdl
## X-squared = 1.7771, df = 10, p-value = 0.9978
The p-value of approximately 0.998 indicates that there is no statistically significant evidence to reject the null hypothesis (H0: there is no autocorrelation in the lags considered). That is, the residuals exhibit statistical independence and the model adequately captures the patterns.
To display the analysis click on the button:
In this topic, the aim was to predict the year 2023 up to September and to analyze this possibility. But first, the prediction made below in topic 3.5.2, which attempts to predict the next two years from the interval between 01/2015 and 09/2023, was used as a comparison metric.
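A sketch of the call that produces the 24-month forecast table below (the same call appears again later in the document when the predictions are consolidated):
# Forecast the next 24 months from the model fitted on 01/2015–09/2023
forecast::forecast(ifix_mdl, h = 24)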
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 106 0.7048021 -3.304586 4.714190 -5.427028 6.836632
## 107 0.6946248 -3.380657 4.769907 -5.537981 6.927231
## 108 0.9219869 -3.157831 5.001805 -5.317557 7.161530
## 109 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 110 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 111 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 112 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 113 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 114 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 115 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 116 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 117 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 118 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 119 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 120 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 121 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 122 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 123 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 124 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 125 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 126 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 127 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 128 0.9008650 -3.406218 5.207948 -5.686250 7.487980
## 129 0.9008650 -3.406218 5.207948 -5.686250 7.487980
When forecasting the next 24 months, we have the following columns: Point Forecast (the point forecast), Lo 80 and Hi 80 (lower and upper bounds of the 80% interval), and Lo 95 and Hi 95 (lower and upper bounds of the 95% interval).
At this point, we address the contrast with the already available and concrete data from the year 2023 itself. This can be done by removing it from the sample ("out-of-sample" evaluation).
The reasoning is simple: If we were in December 2022, what would the 2023 IFIX be like using this model?
ifix15_22 <- arima(combined_df$IFIX[1:96], order = c(0,0,3))
forecast15_22 <- forecast(ifix15_22, h = 9)
forecast15_22
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 97 0.9986895 -3.112424 5.109803 -5.288717 7.286096
## 98 1.9862036 -2.179274 6.151681 -4.384345 8.356752
## 99 1.1037420 -3.064463 5.271947 -5.270978 7.478462
## 100 0.8790011 -3.514623 5.272625 -5.840467 7.598469
## 101 0.8790011 -3.514623 5.272625 -5.840467 7.598469
## 102 0.8790011 -3.514623 5.272625 -5.840467 7.598469
## 103 0.8790011 -3.514623 5.272625 -5.840467 7.598469
## 104 0.8790011 -3.514623 5.272625 -5.840467 7.598469
## 105 0.8790011 -3.514623 5.272625 -5.840467 7.598469
Evaluating accuracy:
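A sketch of how the accuracy table below can be obtained (assuming forecast::accuracy() applied to the forecast object):
# Training-set accuracy measures of the model fitted up to 12/2022
forecast::accuracy(forecast15_22)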
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.007273949 3.207919 2.397492 Inf Inf 0.7728859 -0.01321303
The MAPE (Mean Absolute Percentage Error) suggests that, on average, the forecasts have a mean absolute percentage error of approximately 198.12%. Such a large error…
Now the model fitted up to 09/2023:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.007488662 3.128542 2.353064 Inf Inf 0.7868008 -0.0069888
The worsening of the 2022 model may have occurred due to an election year with large non-recurring fluctuations.
Standardizing data from forecast 2015~2022:
# Extracting values from the forecast object:
forecastibble <- tibble(
  Point_Forecast = forecast15_22$mean,
  Lo_80 = forecast15_22$lower[, "80%"],
  Hi_80 = forecast15_22$upper[, "80%"],
  Lo_95 = forecast15_22$lower[, "95%"],
  Hi_95 = forecast15_22$upper[, "95%"])
# Creating blank rows for the historical period and adjusting
b_row <- tibble(
  Point_Forecast = rep(NA, 96),
  Lo_80 = rep(NA, 96),
  Hi_80 = rep(NA, 96),
  Lo_95 = rep(NA, 96),
  Hi_95 = rep(NA, 96)
)
forecastibble <- bind_rows(b_row, forecastibble)
tail(forecastibble,10)
Standardizing information and consolidating it to allow visualization:
# Recreating the data frame with the dates and the IFIX
ifix_pred <- subset(combined_df, select = c(timestamp, IFIX))
# Joining the data frame with the predictions for 2023
ifix_pred <- merge(ifix_pred, forecastibble, by = 0)
# Removing the index column
ifix_pred$Row.names <- NULL
# Reordering in ascending date order
ifix_pred <- ifix_pred %>%
  arrange(as.Date(timestamp))
# Splitting into two data frames so the IFIX values can be replicated up to 12/2022
ifix_pred_slice_top <- slice(ifix_pred, 1:96)
ifix_pred_slice_bottom <- slice(ifix_pred, 97:105)
# Replicating the IFIX values into the remaining columns
ifix_pred_slice_top <- ifix_pred_slice_top %>%
  mutate(
    Point_Forecast = ifix_pred_slice_top$IFIX,
    Lo_80 = ifix_pred_slice_top$IFIX,
    Hi_80 = ifix_pred_slice_top$IFIX,
    Lo_95 = ifix_pred_slice_top$IFIX,
    Hi_95 = ifix_pred_slice_top$IFIX
  )
# Joining the two halves
ifix_pred <- rbind(ifix_pred_slice_top, ifix_pred_slice_bottom)
# Cumulative sum of the variations
ifix_pred$IFIX <- cumsum(ifix_pred$IFIX)
ifix_pred$Prd <- round(cumsum(ifix_pred$Point_Forecast),2)
ifix_pred$L80 <- round(cumsum(ifix_pred$Lo_80),2)
ifix_pred$H80 <- round(cumsum(ifix_pred$Hi_80),2)
ifix_pred$L95 <- round(cumsum(ifix_pred$Lo_95),2)
ifix_pred$H95 <- round(cumsum(ifix_pred$Hi_95),2)
ifix_pred <- subset(ifix_pred, select = c(timestamp, IFIX, Prd, L80, H80, L95, H95))
ifix_pred
Analyzing the forecast graphically:
plt_pred_ifix <- plot_ly(data = ifix_pred, x = ~timestamp, type = "scatter",
y = ~IFIX, mode = "lines", name = "IFIX") %>%
add_trace(y = ~Prd, mode = "lines", name = "Predito") %>%
add_trace(y = ~L80, mode = "lines", name = "Menor 80%") %>%
add_trace(y = ~H80, mode = "lines", name = "Maior 80%") %>%
add_trace(y = ~L95, mode = "lines", name = "Menor 95%") %>%
add_trace(y = ~H95, mode = "lines", name = "Maior 95%") %>%
add_ribbons(ymin = ~L80, ymax = ~H80, color = I("rgba(51, 160, 44, 0.3)"),
name = "Intervalo 80%", showlegend = FALSE) %>%
add_ribbons(ymin = ~L95, ymax = ~H95, color = I("rgba(214, 39, 40, 0.3)"),
name = "Intervalo 95%", showlegend = FALSE) %>%
layout(title = "Predição IFIX 2023",
font = list(color = "white"),
xaxis = list(title = "Data", nticks = 9, gridcolor = "#303030"),
yaxis = list(title = "Rendimento (valores em porcentagem)", gridcolor = "#303030"),
hovermode = "x unified",
paper_bgcolor = "#222222",
plot_bgcolor = "#222222",
colorway = c("#ff7f0e","#17becf","#33a02c",
"#33a02c","#d62728","#d62728","#d62728")
)
plt_pred_ifix
Analyzing the prediction graphically, restricted to the proposed period:
# Filtering to start from Dec/2022
ifix_pred_filter <- ifix_pred %>%
  filter(row_number() >= 96)
# Creating the plot
f_plt_pred_ifix <- plot_ly(data = ifix_pred_filter, x = ~timestamp, type = "scatter",
y = ~IFIX, mode = "lines", name = "IFIX") %>%
add_trace(y = ~Prd, mode = "lines", name = "Predito") %>%
add_trace(y = ~L80, mode = "lines", name = "Menor 80%") %>%
add_trace(y = ~H80, mode = "lines", name = "Maior 80%") %>%
add_trace(y = ~L95, mode = "lines", name = "Menor 95%") %>%
add_trace(y = ~H95, mode = "lines", name = "Maior 95%") %>%
add_ribbons(ymin = ~L80, ymax = ~H80, color = I("rgba(51, 160, 44, 0.3)"),
name = "Intervalo 80%", showlegend = FALSE) %>%
add_ribbons(ymin = ~L95, ymax = ~H95, color = I("rgba(214, 39, 40, 0.3)"),
name = "Intervalo 95%", showlegend = FALSE) %>%
layout(title = "Predição IFIX 2023",
font = list(color = "white"),
xaxis = list(title = "Data", nticks = 1, gridcolor = "#303030"),
yaxis = list(title = "Rendimento (valores em porcentagem)", gridcolor = "#303030"),
hovermode = "x unified",
paper_bgcolor = "#222222",
plot_bgcolor = "#222222",
colorway = c("#ff7f0e","#17becf","#33a02c",
"#33a02c","#d62728","#d62728","#d62728")
)
f_plt_pred_ifix
How did the model developed here, trained on the interval between 01/2015 and 12/2022, predict the year 2023 up to the month of September?
The variation is large because the further "into the future" the forecast goes, the greater the chance of error; in other words, the error accumulates. Another major factor, hypothesized here, is that the 2022 elections may have contributed to a greater non-recurring fluctuation in the observed period.
Trained on the interval between 01/2015 and 09/2023, how does the model developed here predict the interval from 10/2023 to 09/2025?
# Repeating the code to generate the predictions
forecast_ifix <- forecast(ifix_mdl, h = 24)
# Extracting values from the forecast object:
forecast_ifix <- tibble(
  Point_Forecast = forecast_ifix$mean,
  Lo_80 = forecast_ifix$lower[, "80%"],
  Hi_80 = forecast_ifix$upper[, "80%"],
  Lo_95 = forecast_ifix$lower[, "95%"],
  Hi_95 = forecast_ifix$upper[, "95%"])
forecast_ifix <- forecast_ifix %>%
  mutate(timestamp = seq.Date(from = as.Date("2023-10-01"),
                              to = as.Date("2025-09-01"),
                              by = "month"))
# Recreating the data frame with the dates and the IFIX
forecast_ifix_pred <- subset(combined_df, select = c(timestamp, IFIX))
# Replicating the IFIX values into the remaining columns
forecast_ifix_pred <- forecast_ifix_pred %>%
  mutate(
    Lo_80 = forecast_ifix_pred$IFIX,
    Hi_80 = forecast_ifix_pred$IFIX,
    Lo_95 = forecast_ifix_pred$IFIX,
    Hi_95 = forecast_ifix_pred$IFIX
  )
# Renaming the column for compatibility
forecast_ifix <- forecast_ifix %>%
  rename(IFIX = Point_Forecast)
# Consolidating the data
forecast_ifix_pred <- rbind(forecast_ifix_pred, forecast_ifix)
# Cumulative sum of the variations
forecast_ifix_pred$IFIX <- round(cumsum(forecast_ifix_pred$IFIX),2)
forecast_ifix_pred$L80 <- round(cumsum(forecast_ifix_pred$Lo_80),2)
forecast_ifix_pred$H80 <- round(cumsum(forecast_ifix_pred$Hi_80),2)
forecast_ifix_pred$L95 <- round(cumsum(forecast_ifix_pred$Lo_95),2)
forecast_ifix_pred$H95 <- round(cumsum(forecast_ifix_pred$Hi_95),2)
forecast_ifix_pred <- subset(forecast_ifix_pred, select = c(timestamp, IFIX, L80, H80, L95, H95))
forecast_ifix_pred
Analyzing the forecast graphically:
plt_pred_25 <- plot_ly(data = forecast_ifix_pred, x = ~timestamp, type = "scatter",
y = ~IFIX, mode = "lines", name = "IFIX") %>%
add_trace(y = ~L80, mode = "lines", name = "Menor 80%") %>%
add_trace(y = ~H80, mode = "lines", name = "Maior 80%") %>%
add_trace(y = ~L95, mode = "lines", name = "Menor 95%") %>%
add_trace(y = ~H95, mode = "lines", name = "Maior 95%") %>%
add_ribbons(ymin = ~L80, ymax = ~H80, color = I("rgba(51, 160, 44, 0.3)"),
name = "Intervalo 80%", showlegend = FALSE) %>%
add_ribbons(ymin = ~L95, ymax = ~H95, color = I("rgba(214, 39, 40, 0.3)"),
name = "Intervalo 95%", showlegend = FALSE) %>%
layout(title = "Predição IFIX 2025",
font = list(color = "white"),
xaxis = list(title = "Data", nticks = 11, gridcolor = "#303030"),
yaxis = list(title = "Rendimento (valores em porcentagem)", gridcolor = "#303030"),
hovermode = "x unified",
paper_bgcolor = "#222222",
plot_bgcolor = "#222222",
colorway = c("#ff7f0e","#17becf","#33a02c",
"#33a02c","#d62728","#d62728","#d62728")
)
plt_pred_25
Analyzing the prediction graphically, restricted to the proposed period:
# Filtering to start from the last observed month (09/2023)
ifix_pred_filter_25 <- forecast_ifix_pred %>%
  filter(row_number() >= 105)
# Creating the plot
f_plt_pred_ifix_25 <- plot_ly(data = ifix_pred_filter_25, x = ~timestamp, type = "scatter", y = ~IFIX, mode = "lines", name = "IFIX") %>%
add_trace(y = ~L80, mode = "lines", name = "Menor 80%") %>%
add_trace(y = ~H80, mode = "lines", name = "Maior 80%") %>%
add_trace(y = ~L95, mode = "lines", name = "Menor 95%") %>%
add_trace(y = ~H95, mode = "lines", name = "Maior 95%") %>%
add_ribbons(ymin = ~L80, ymax = ~H80, color = I("rgba(51, 160, 44, 0.3)"),
name = "Intervalo 80%", showlegend = FALSE) %>%
add_ribbons(ymin = ~L95, ymax = ~H95, color = I("rgba(214, 39, 40, 0.3)"),
name = "Intervalo 95%", showlegend = FALSE) %>%
layout(title = "Predição IFIX 2025",
font = list(color = "white"),
xaxis = list(title = "Data", nticks = 4, gridcolor = "#303030"),
yaxis = list(title = "Rendimento (valores em porcentagem)", gridcolor = "#303030"),
hovermode = "x unified",
paper_bgcolor = "#222222",
plot_bgcolor = "#222222",
colorway = c("#ff7f0e","#17becf","#33a02c",
"#33a02c","#d62728","#d62728","#d62728"))
f_plt_pred_ifix_25
It is assumed that a good investment is made at the appropriate place, time and value. This topic proposed to reflect on the interaction of these three variables, on problems and on possible implicit solutions.
The present study aimed to analyze the IFIX index, its correlations and the possibility of predicting trends based on data from January 2015 to September 2023. Correlations with other market indices, such as IBOV, Selic and IPCA, were analyzed: first looking for correlations and then projecting trends.
A large part of what is understood as “profit” comes from the premise of the current moment. In the case of investments, it can be better expressed by “anticipating” this moment. The present work did not analyze the almost infinite myriad of possibilities that impact this so-called “moment”. For example, COPOM (Monetary Policy Committee) meetings that dictate the Selic rate forecast.
Briefly, a decrease in the Selic (shown in the study) already had an impact on the IFIX and IBOV variable-income indices months in advance. In other words, abrupt drops in variable income "today" could perhaps be explained by higher rates signaled by COPOM for the "coming months".
Furthermore, considering the sampling window (2015 to 2023) and its interval (monthly), it is concluded that the analysis has limitations stemming from these choices. Additionally, the context of the last few troubled years of pandemic, impeachment, changes of government and the amplitude of the Selic rate must be considered. A larger window and smaller intervals would potentially capture this intrinsic connection more adequately.
This intrinsic connection can be seen, for example, at the end of the 2022 elections, when trying to predict the 2023 values (topic 3.5.1). In other words, money only has value in time.
Finally, it is worth stating that the present work fulfilled its objective of analyzing the subject on an experimental and speculative basis, and, in a way, it can be seen as a sketch of what "can be done". It is a didactic exercise intended to explore and display a repertoire on the contents covered here.
IFIX and IBOV indices:
TvDataFeed (more details: https://github.com/rongardF/tvdatafeed )
IPCA and Selic indices:
BACEN (more details: https://www3.bcb.gov.br/sgspub/ )
IPCA: https://dadosabertos.bcb.gov.br/dataset/10844-indice-de-precos-ao-consumidor-amplo-ipca---servicos
Selic: https://dadosabertos.bcb.gov.br/dataset/4390-taxa-de-juros---selic-acumulada-no-mes